# Dense feature extraction

All checkpoints below are Apache-2.0 licensed SigLIP 2 vision-language models trained on the WebLI dataset and support zero-shot image classification. The `timm` checkpoints are used through timm, while the `google` checkpoints are improved multilingual vision-language encoders used through Transformers, integrating multiple techniques to enhance semantic understanding, localization, and dense feature extraction.

| Model | Org | Task | Downloads | Likes |
|---|---|---|---|---|
| ViT gopt 16 SigLIP2 384 | timm | Text-to-Image | 1,953 | 1 |
| ViT SO400M 16 SigLIP2 512 | timm | Text-to-Image | 1,191 | 4 |
| ViT SO400M 16 SigLIP2 384 | timm | Text-to-Image | 106.30k | 2 |
| ViT SO400M 16 SigLIP2 256 | timm | Text-to-Image | 998 | 0 |
| ViT L 16 SigLIP2 512 | timm | Text-to-Image | 147 | 2 |
| ViT L 16 SigLIP2 256 | timm | Text-to-Image | 888 | 0 |
| ViT B 16 SigLIP2 512 | timm | Text-to-Image | 1,442 | 1 |
| ViT B 16 SigLIP2 384 | timm | Text-to-Image | 1,497 | 0 |
| ViT B 32 SigLIP2 256 | timm | Text-to-Image | 691 | 0 |
| ViT B 16 SigLIP2 256 | timm | Text-to-Image | 10.32k | 4 |
| Siglip2 So400m Patch14 384 | google | Image-to-Text | 622.54k | 20 |
| Siglip2 So400m Patch14 224 | google | Image-to-Text | 23.11k | 0 |
| Siglip2 Large Patch16 512 | google | Text-to-Image | 4,416 | 8 |
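Since all of the checkpoints above support zero-shot image classification, a minimal sketch of how one might be used is shown below, via the Transformers `zero-shot-image-classification` pipeline. The checkpoint name (`google/siglip2-base-patch16-224`), the synthetic test image, and the label set are illustrative assumptions, not taken from the listing; the `timm` checkpoints would instead be loaded through timm/open_clip.

```python
# Hedged sketch: zero-shot image classification with an assumed SigLIP 2
# checkpoint, using the Hugging Face transformers pipeline API.
from PIL import Image
from transformers import pipeline

# Illustrative checkpoint choice; any SigLIP 2 checkpoint hosted for
# Transformers should work the same way.
classifier = pipeline(
    task="zero-shot-image-classification",
    model="google/siglip2-base-patch16-224",
)

# Stand-in for a real photo: a solid red 224x224 image.
image = Image.new("RGB", (224, 224), color="red")
labels = ["a red square", "a photo of a cat", "a photo of a dog"]

# Returns one dict per candidate label with a "label" and a "score".
results = classifier(image, candidate_labels=labels)
for r in results:
    print(f"{r['label']}: {r['score']:.4f}")
```

Note that SigLIP-family models score each label independently with a sigmoid rather than a softmax, so the scores are not expected to sum to 1.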